Concatenates two or more strings into one. Non-string arguments must be explicitly converted to string using the TO_STRING()
function.
CONCAT(TO_STRING(float_attr), ',', TO_STRING(int_attr), ',', title)
LEVENSHTEIN ( pattern, source, {normalize=0, length_delta=0})
returns number (Levenshtein distance) of single-character edits (insertions, deletions or substitutions) between pattern
and source
strings required to make in pattern
to make it source
.
pattern
,source
- constant string, string field name, JSON field name, or any expression that produces a string (like e.g., SUBSTRING_INDEX())normalize
- option to return the distance as a float number in the range[0.0 - 1.0]
, where 0.0 is an exact match, and 1.0 is the maximum difference. The default value is 0, meaning not to normalize and provide the result as an integer.length_delta
- skips Levenshtein distance calculation and returnsmax(strlen(pattern), strlen(source))
if the option is set and the lengths of the strings differ by more than thelength_delta
value. The default value is 0, meaning to calculate Levenshtein distance for any input strings. This option can be useful when checking mostly similar strings.
SELECT LEVENSHTEIN('gily', attr1) AS dist, WEIGHT() AS w FROM test WHERE MATCH('test') ORDER BY w DESC, dist ASC;
SELECT LEVENSHTEIN('gily', j.name, {length_delta=6}) AS dist, WEIGHT() AS w FROM test WHERE MATCH('test') ORDER BY w DESC;
SELECT LEVENSHTEIN(title, j.name, {normalize=1}) AS dist, WEIGHT() AS w FROM test WHERE MATCH ('test') ORDER BY w DESC, dist ASC;
The REGEX(attr,expr)
function returns 1 if a regular expression matches the attribute's string, and 0 otherwise. It works with both string and JSON attributes.
SELECT REGEX(content, 'box?') FROM test;
SELECT REGEX(j.color, 'red | pink') FROM test;
Expressions should adhere to the RE2 syntax. To perform a case-insensitive search, for instance, you can use:
SELECT REGEX(content, '(?i)box') FROM test;
The SNIPPET()
function can be used to highlight search results within a given text. The first two arguments are: the text to be highlighted, and a query. Options can be passed to the function as the third, fourth, and so on arguments. SNIPPET()
can obtain the text for highlighting directly from the table. In this case, the first argument should be the field name:
SELECT SNIPPET(body, QUERY()) FROM myIndex WHERE MATCH('my.query')
In this example, the QUERY()
expression returns the current full-text query. SNIPPET()
can also highlight non-indexed text:
mysql SELECT id, SNIPPET('text to highlight', 'my.query', 'limit=100') FROM myIndex WHERE MATCH('my.query')
Additionally, it can be used to highlight text fetched from other sources using a User-Defined Function (UDF):
SELECT id, SNIPPET(myUdf(id), 'my.query', 'limit=100') FROM myIndex WHERE MATCH('my.query')
In this context, myUdf()
is a User-Defined Function (UDF) that retrieves a document by its ID from an external storage source. The SNIPPET()
function is considered a "post limit" function, which means that the computation of snippets is delayed until the entire final result set is prepared, and even after the LIMIT
clause has been applied. For instance, if a LIMIT 20,10
clause is used, SNIPPET()
will be called no more than 10 times.
It is important to note that SNIPPET()
does not support field-based limitations. For this functionality, use HIGHLIGHT() instead.
SUBSTRING_INDEX(string, delimiter, number)
returns a substring of the original string, based on a specified number of delimiter occurrences:
- string - The original string, which can be a constant string or a string from a string/JSON attribute.
- delimiter - The delimiter to search for.
- number - The number of times to search for the delimiter. This can be either a positive or negative number. If it is a positive number, the function will return everything to the left of the delimiter. If it is a negative number, the function will return everything to the right of the delimiter.
SUBSTRING_INDEX()
by default returns a string, but it can also be coerced into other types (such as integer or float) if necessary. Numeric values can be converted using specific functions (such as BIGINT()
, DOUBLE()
, etc.).
- SQL
SELECT SUBSTRING_INDEX('www.w3schools.com', '.', 2) FROM test;
SELECT SUBSTRING_INDEX(j.coord, ' ', 1) FROM test;
SELECT SUBSTRING_INDEX('1.2 3.4', ' ', 1); /* '1.2' */
SELECT SUBSTRING_INDEX('1.2 3.4', ' ', -1); /* '3.4' */
SELECT sint ( SUBSTRING_INDEX('1.2 3.4', ' ', 1)); /* 1 */
SELECT sint ( SUBSTRING_INDEX('1.2 3.4', ' ', -1)); /* 3 */
SELECT double ( SUBSTRING_INDEX('1.2 3.4', ' ', 1)); /* 1.200000 */
SELECT double ( SUBSTRING_INDEX('1.2 3.4', ' ', -1)); /* 3.400000 */
UPPER(string)
convert argument to upper case, LOWER(string)
convert argument to lower case.
Result also can be promoted to numeric, but only if string argument is convertible to a number. Numeric values could be promoted with arbitrary functions (BITINT
, DOUBLE
, etc.).
SELECT upper('www.w3schools.com', '.', 2); /* WWW.W3SCHOOLS.COM */
SELECT double (upper ('1.2e3')); /* 1200.000000 */
SELECT integer (lower ('12345')); /* 12345 */
Returns the current connection ID.
mysql> select CONNECTION_ID();
+-----------------+
| CONNECTION_ID() |
+-----------------+
| 6 |
+-----------------+
1 row in set (0.00 sec)
Returns KNN vector search distance.
mysql> select id, knn_dist() from test where knn ( image_vector, 5, (0.286569,-0.031816,0.066684,0.032926) ) and match('white') and id < 10;
+------+------------+
| id | knn_dist() |
+------+------------+
| 2 | 0.81527930 |
+------+------------+
1 row in set (0.00 sec)
Returns the IDs of documents that were inserted or replaced by the last statement in the current session.
The same value can also be obtained via the @@session.last_insert_id
variable.
mysql> select @@session.last_insert_id;
+--------------------------+
| @@session.last_insert_id |
+--------------------------+
| 11,32 |
+--------------------------+
1 rows in set
mysql> select LAST_INSERT_ID();
+------------------+
| LAST_INSERT_ID() |
+------------------+
| 25,26,29 |
+------------------+
1 rows in set
Returns a "short" universal identifier as a 63-bit unsigned integer following the same algorithm as for auto-id generation.
NOTE: Using
uuid_short()
will increase the counter used for auto-IDs, so use it carefully. While you can runuuid_short()
in queries likeSELECT uuid_short() FROM ...
, this approach isn't ideal as it may create large gaps in your auto-generated IDs.
mysql> select uuid_short();
+---------------------+
| uuid_short() |
+---------------------+
| 5839598169280741377 |
+---------------------+
1 row in set (0.00 sec)
mysql> select uuid_short();
+---------------------+
| uuid_short() |
+---------------------+
| 5839598169280741378 |
+---------------------+
1 row in set (0.00 sec)
⪢ Securing and compacting a table
Backing up your tables on a regular basis is essential for recovery in the event of system crashes, hardware failure, or data corruption/loss. It's also highly recommended to make backups before upgrading to a new Manticore Search version or running ALTER TABLE.
Backing up database systems can be done in two unique ways: logical and physical backups. Each of these methods has its pros and cons, which may vary based on the specific database environment and needs. Here, we'll delve into the distinction between these two types of backups.
Logical backups entail exporting the database schema and data as SQL statements or as data formats specific to the database. This backup form is typically readable by humans and can be employed to restore the database on various systems or database engines.
Pros and cons of logical backups:
- ➕ Portability: Logical backups are generally more portable than physical backups, as they can be used to restore the database on different hardware or operating systems.
- ➕ Flexibility: Logical backups allow you to selectively restore specific tables, indexes, or other database objects.
- ➕ Compatibility: Logical backups can be used to migrate data between different database management systems or versions, provided the target system supports the exported format or SQL statements.
- ➖ Slower Backup and Restore: Logical backups can be slower than physical backups, as they require the database engine to convert the data into SQL statements or another export format.
- ➖ Increased System Load: Creating logical backups can cause higher system load, as the process requires more CPU and memory resources to process and export the data.
Manticore Search supports mysqldump for logical backups.
Physical backups involve copying the raw data files and system files that comprise the database. This type of backup essentially creates a snapshot of the database's physical state at a given point in time.
Pros and cons of physical backups:
- ➕ Speed: Physical backups are usually faster than logical backups, as they involve copying raw data files directly from disk.
- ➕ Consistency: Physical backups ensure a consistent backup of the entire database, as all related files are copied together.
- ➕ Lower System Load: Creating physical backups generally places less load on the system compared to logical backups, as the process does not involve additional data processing.
- ➖ Portability: Physical backups are typically less portable than logical backups, as they may be dependent on the specific hardware, operating system, or database engine configuration.
- ➖ Flexibility: Physical backups do not allow for the selective restoration of specific database objects, as the backup contains the entire database's raw files.
- ➖ Compatibility: Physical backups cannot be used to migrate data between different database management systems or versions, as the raw data files may not be compatible across different platforms or software.
Manticore Search has manticore-backup command line tool for physical backups.
In summary, logical backups provide more flexibility, portability, and compatibility but can be slower and more resource-intensive, while physical backups are faster, more consistent, and less resource-intensive but may be limited in terms of portability and flexibility. The choice between these two backup methods will depend on your specific database environment, hardware, and requirements.
The manticore-backup
tool, included in the official Manticore Search packages, automates the process of backing up tables for an instance running in RT mode.
If you followed the official installation instructions, you should already have everything installed and don't need to worry. Otherwise, manticore-backup
requires PHP 8.1.10 and specific modules or manticore-executor
, which is a part of the manticore-extra
package, and you need to ensure that one of these is available.
Note that manticore-backup
is not available for Windows yet.
First, make sure you're running manticore-backup
on the same server where the Manticore instance you are about to back up is running.
Second, we recommend running the tool under the root
user so the tool can transfer ownership of the files you are backing up. Otherwise, a backup will be also made but with no ownership transfer. In either case, you should make sure that manticore-backup
has access to the data dir of the Manticore instance.
The only required argument for manticore-backup
is --backup-dir
, which specifies the destination for the backup. If you don't provide any additional arguments, manticore-backup
will:
- locate a Manticore instance running with the default configuration
- create a subdirectory in the
--backup-dir
directory with a timestamped name - backup all tables found in the instance
- Example
manticore-backup --config=path/to/manticore.conf --backup-dir=backupdir
Copyright (c) 2023-2024, Manticore Software LTD (https://manticoresearch.com)
Manticore config file: /etc/manticoresearch/manticore.conf
Tables to backup: all tables
Target dir: /mnt/backup/
Manticore config
endpoint = 127.0.0.1:9308
Manticore versions:
manticore: 5.0.2
columnar: 1.15.4
secondary: 1.15.4
2022-10-04 17:18:39 [Info] Starting the backup...
2022-10-04 17:18:39 [Info] Backing up config files...
2022-10-04 17:18:39 [Info] config files - OK
2022-10-04 17:18:39 [Info] Backing up tables...
2022-10-04 17:18:39 [Info] pq (percolate) [425B]...
2022-10-04 17:18:39 [Info] OK
2022-10-04 17:18:39 [Info] products (rt) [512B]...
2022-10-04 17:18:39 [Info] OK
2022-10-04 17:18:39 [Info] Running sync
2022-10-04 17:18:42 [Info] OK
2022-10-04 17:18:42 [Info] You can find backup here: /mnt/backup/backup-20221004171839
2022-10-04 17:18:42 [Info] Elapsed time: 2.76s
2022-10-04 17:18:42 [Info] Done
To back up specific tables only, use the --tables
flag followed by a comma-separated list of tables, for example --tables=tbl1,tbl2
. This will only backup the specified tables and ignore the rest.
- Example
manticore-backup --backup-dir=/mnt/backup/ --tables=products
Copyright (c) 2023-2024, Manticore Software LTD (https://manticoresearch.com)
Manticore config file: /etc/manticoresearch/manticore.conf
Tables to backup: products
Target dir: /mnt/backup/
Manticore config
endpoint = 127.0.0.1:9308
Manticore versions:
manticore: 5.0.3
columnar: 1.16.1
secondary: 0.0.0
2022-10-04 17:25:02 [Info] Starting the backup...
2022-10-04 17:25:02 [Info] Backing up config files...
2022-10-04 17:25:02 [Info] config files - OK
2022-10-04 17:25:02 [Info] Backing up tables...
2022-10-04 17:25:02 [Info] products (rt) [512B]...
2022-10-04 17:25:02 [Info] OK
2022-10-04 17:25:02 [Info] Running sync
2022-10-04 17:25:06 [Info] OK
2022-10-04 17:25:06 [Info] You can find backup here: /mnt/backup/backup-20221004172502
2022-10-04 17:25:06 [Info] Elapsed time: 4.82s
2022-10-04 17:25:06 [Info] Done
Argument | Description |
---|---|
--backup-dir=path |
This is the path to the backup directory where the backup will be stored. The directory must already exist. This argument is required and has no default value. On each backup run, manticore-backup will create a subdirectory in the provided directory with a timestamp in the name (backup-[datetime] ), and will copy all required tables to it. So the --backup-dir is a container for all your backups, and it's safe to run the script multiple times. |
--restore[=backup] |
Restore from --backup-dir . Just --restore lists available backups. --restore=backup will restore from <--backup-dir>/backup . |
--force |
Skip versions check on restore and gracefully restore the backup. |
--disable-telemetry |
Pass this flag in case you want to disable sending anonymized metrics to Manticore. You can also use environment variable TELEMETRY=0 |
--config=/path/to/manticore.conf |
Path to the Manticore configuration. Optional. If not provided, a default configuration for your operating system will be used. Used to determine the host and port for communication with the Manticore daemon. The manticore-backup tool supports dynamic configuration files. You can specify the --config option multiple times if your configuration is spread across multiple files. |
--tables=tbl1,tbl2, ... |
Semicolon-separated list of tables that you want to back up. To back up all tables, omit this argument. All the provided tables must exist in the Manticore instance you are backing up from, or the backup will fail. |
--compress |
Whether the backed up files should be compressed. Not enabled by default. |
--unlock |
In rare cases when something goes wrong, tables can be left in a locked state. Use this argument to unlock them. |
--version |
Show the current version. |
--help |
Show this help. |
You can also back up your data through SQL by running the simple command BACKUP TO /path/to/backup
.
NOTE:
BACKUP
is not supported in Windows. Consider using mysqldump instead.
NOTE:
BACKUP
requires Manticore Buddy. If it doesn't work, make sure Buddy is installed.
BACKUP
[{TABLE | TABLES} a[, b]]
[{OPTION | OPTIONS}
async = {on | off | 1 | 0 | true | false | yes | no}
[, compress = {on | off | 1 | 0 | true | false | yes | no}]
]
TO path_to_backup
For instance, to back up tables a
and b
to the /backup
directory, run the following command:
BACKUP TABLES a, b TO /backup
There are options available to control and adjust the backup process, such as:
async
: makes the backup non-blocking, allowing you to receive a response with the query ID immediately and run other queries while the backup is ongoing. The default value is0
.compress
: enables file compression using zstd. The default value is0
. For example, to run a backup of all tables in async mode with compression enabled to the/tmp
directory:
BACKUP OPTION async = yes, compress = yes TO /tmp
- The path should not contain special symbols or spaces, as they are not supported.
- Ensure that Manticore Buddy is launched (it is by default).
To ensure consistency of tables during backup, Manticore Search's backup tools use the innovative FREEZE and UNFREEZE commands. Unlike the traditional lock and unlock tables feature of e.g. MySQL, FREEZE
stops flushing data to disk while still permitting writing (to some extent) and selecting updated data from the table.
However, if your RAM chunk size grows beyond the rt_mem_limit
threshold during lengthy backup operations involving many inserts, data may be flushed to disk, and write operations will be blocked until flushing is complete. Despite this, the tool maintains a balance between table locking, data consistency, and database write availability while the table is frozen.
When you use manticore-backup
or the SQL BACKUP
command, the FREEZE
command is executed once and freezes all tables you are backing up simultaneously. The backup process subsequently backs up each table one by one, releasing the freeze after successfully backing up each table.
If backup fails or gets interrupted, the tool tries to unfreeze all the tables.
To restore a Manticore instance from a backup, use the manticore-backup
command with the --backup-dir
and --restore
arguments. For example: manticore-backup --backup-dir=/path/to/backups --restore
. If you don't provide any argument for --restore
, it will simply list all the backups in the --backup-dir
.
- Example
manticore-backup --backup-dir=/mnt/backup/ --restore
Copyright (c) 2023-2024, Manticore Software LTD (https://manticoresearch.com)
Manticore config file:
Backup dir: /tmp/
Available backups: 3
backup-20221006144635 (Oct 06 2022 14:46:35)
backup-20221006145233 (Oct 06 2022 14:52:33)
backup-20221007104044 (Oct 07 2022 10:40:44)
To start a restore job, run manticore-backup
with the flag --restore=backup name
, where backup name
is the name of the backup directory within the --backup-dir
. Note that:
- There can't be any Manticore instance running on the same host and port as the one being restored.
- The old
manticore.json
file must not exist. - The old configuration file must not exist.
- The old data directory must exist and be empty.
If all conditions are met, the restore will proceed. The tool will provide hints, so you don't have to memorize them. It's crucial to avoid overwriting existing files, so make sure to remove them prior to the restore if they still exist. Hence all the conditions.
- Example
manticore-backup --backup-dir=/mnt/backup/ --restore=backup-20221007104044
Copyright (c) 2023-2024, Manticore Software LTD (https://manticoresearch.com)
Manticore config file:
Backup dir: /tmp/
2022-10-07 11:17:25 [Info] Starting to restore...
Manticore config
endpoint = 127.0.0.1:9308
2022-10-07 11:17:25 [Info] Restoring config files...
2022-10-07 11:17:25 [Info] config files - OK
2022-10-07 11:17:25 [Info] Restoring state files...
2022-10-07 11:17:25 [Info] config files - OK
2022-10-07 11:17:25 [Info] Restoring data files...
2022-10-07 11:17:25 [Info] config files - OK
2022-10-07 11:17:25 [Info] The backup '/tmp/backup-20221007104044' was successfully restored.
2022-10-07 11:17:25 [Info] Elapsed time: 0.02s
2022-10-07 11:17:25 [Info] Done
NOTE: some versions of
mysqldump
/mariadb-dump
require Manticore Buddy. If the dump isn't working, make sure Buddy is installed.
To create a backup of your Manticore Search database, you can use the mysqldump
command. We will use the default port and host in the examples.
Note, mysqldump
is supported only for real-time tables.
- Basic
- Replace mode
- Replication mode
mysqldump -h0 -P9306 manticore > manticore_backup.sql
mariadb-dump -h0 -P9306 manticore > manticore_backup.sql
Executing this command will produce a backup file named manticore_backup.sql
. This file will hold all data and table schemas.
mysqldump -h0 -P9306 --replace --net-buffer-length=16m -etc manticore tbl > tbl.sql
This will produce a backup file tbl.sql
with replace
commands instead of insert
, with column names retained in each batch. Documents will be batched up to 16 megabytes large. There will be no drop
/create table
commands. This is useful for full-text reindexation after changing tokenization settings.
mysqldump -etc --replace -h0 -P9306 -ucluster manticore cluster:tbl | mysql -P9306 -h0
mariadb-dump -etc --replace -h0 -P9306 -ucluster manticore cluster:tbl | mysql -P9306 -h0
In this case, mysqldump
will generate commands like REPLACE INTO cluster:table ...
, which will be sent directly to the Manticore instance, resulting in the documents being reinserted.
Use the cluster
user and the -t
flag to enable replication mode. See the details in the notes below.
If you're looking to restore a Manticore Search database from a backup file, the mysql client is your tool of choice.
Note, if you are restoring in Plain mode, you cannot drop and recreate tables directly. Therefore, you should:
- Use
mysqldump
with the-t
option to excludeCREATE TABLE
statements from your backup. - Manually TRUNCATE the tables before proceeding with the restoration.
- SQL
mysql -h0 -P9306 < manticore_backup.sql
mariadb -h0 -P9306 < manticore_backup.sql
This command enables you to restore everything from the manticore_backup.sql
file.
Here are some more settings that can be used with mysqldump to tailor your backup:
-t
skipsdrop
/create
table commands. Useful for full-text reindexation of a table after changing tokenization settings.--no-data
: This setting omits table data from the backup, resulting in a backup file that consists only of table schemas.--ignore-table=[database_name].[table_name]
: This option allows you to bypass a particular table during the backup operation. Note that the database name must bemanticore
.--replace
to performreplace
instead ofinsert
. Useful for full-text reindexation of a table after changing tokenization settings.--net-buffer-length=16M
to make batches up to 16 megabytes large for faster restoration.-e
to batch up documents. Useful for faster restoration.-c
to keep column names. Useful for reindexation of a table after changing its schema (e.g., changing field order).
For a comprehensive list of settings and their thorough descriptions, kindly refer to the official MySQL documentation or MariaDB documentation.
- To create a dump in replication mode (where the dump includes
INSERT/REPLACE INTO <cluster_name>:<table_name>
):- Use the
cluster
user. For example:mysqldump -u cluster ...
ormariadb-dump -u cluster ...
. You can change the username that enables replication mode formysqldump
by runningSET GLOBAL cluster_user = new_name
. - Use the
-t
flag. - When specifying a table in replication mode, you need to follow the
cluster_name:table_name
syntax. For example:mysqldump -P9306 -h0 -t -ucluster manticore cluster:tbl
.
- Use the
- It's recommended to explicitly specify the
manticore
database when you plan to back up all databases, instead of using the--all-databases
option. - Note that
mysqldump
does not support backing up distributed tables and cannot back up tables containing non-stored fields. For such cases, consider usingmanticore-backup
or theBACKUP
SQL command. If you have distributed tables, it is recommended to always specify the tables to be dumped.